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(N 



Traditional sampling theories consider the problem of reconstructing an unknown signal x from a series of 

samples. A prevalent assumption which often guarantees recovery from the given measurements is that x lies in 

a known subspace. Recently, there has been growing interest in nonlinear but structured signal models, in which 

C^ . X lies in a union of subspaces. In this paper we develop a general framework for robust and efficient recovery 

. of such signals from a given set of samples. More specifically, we treat the case in which x lies in a sum of k 

r ^ I subspaces, chosen from a larger set of m possibilities. The samples are modelled as inner products with an arbitrary 

P^ ■ set of sampling functions. To derive an efficient and robust recovery algorithm, we show that our problem can be 

formulated as that of recovering a block-sparse vector whose non-zero elements appear in fixed blocks. We then 

propose a mixed ^2/^1 program for block sparse recovery. Our main result is an equivalence condition under which 

(N 

^ ' the proposed convex algorithm is guaranteed to recover the original signal. This result relies on the notion of block 

QQ I restricted isometry property (RIP), which is a generalization of the standard RIP used extensively in the context of 

in ■ 

T- 1- ■ compressed sensing. Based on RIP we also prove stability of our approach in the presence of noise and modelling 

C^ ' errors. A special case of our framework is that of recovering multiple measurement vectors (MMV) that share a joint 

00 . sparsity pattern. Adapting our results to this context leads to new MMV recovery methods as well as equivalence 



conditions under which the entire set can be determined efficiently. 



cd ' I. Introduction 

Sampling theory has a rich history dating back to Cauchy. Undoubtedly, the sampling theorem that had the most 
impact on signal processing and communications is that associated with Whittaker, Kotelriikov, and Shannon [1], 
[2]. Their famous result is that a bandlimited function x{t) can be recovered from its uniform samples as long as 
the sampling rate exceeds the Nyquist rate, corresponding to twice the highest frequency of the signal [3]. More 
recently, this basic theorem has been extended to include more general classes of signal spaces. In particular, it 
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can be shown that under mild technical conditions, a signal x lying in a given subspace can be recovered exactly 
from its linear generalized samples using a series of filtering operations [4]-[7]. 

Recently, there has been growing interest in nonlinear signal models in which the unknown x does not necessarily 
lie in a subspace. In order to ensure recovery from the samples, some underlying structure is needed. A general 
model that captures many interesting cases is that in which x lies in a union of subspaces. In this setting, x resides 
in one of a set of given subspaces Vj, however, a priori it is not known in which one. A special case of this 
framework is the problem underlying the field of compressed sensing (CS), in which the goal is to recover a length 
N vector x from n < N linear measurements, where x has no more than k non-zero elements in some basis [8], 
[9]. Many algorithms have been proposed in the literature in order to recover x in a stable and efficient manner 
[9]-[12]. A variety of conditions have been developed to ensure that these methods recover x exactly. One of the 
main tools in this context is the restricted isometry property (RIP) [9], [13], [14]. In particular, it can be shown 
that if the measurement matrix satisfies the RIP then x can be recovered by solving an ii minimization algorithm. 

Another special case of a union of subspaces is the setting in which the unknown signal x = x{t) has a multiband 
structure, so that its Fourier transform consists of a limited number of bands at unknown locations [15], [16]. By 
formulating this problem within the framework of CS, explicit sub-Nyquist sampling and reconstruction schemes 
were developed in [15], [16] that ensure perfect-recovery at the minimal possible rate. This setup was recently 
generalized in [17], [18] to deal with sampling and reconstruction of signals that lie in a finite union of shift- 
invariant subspaces. By combining ideas from standard sampling theory with CS results [19], explicit low-rate 
sampling and recovery methods were developed for such signal sets. Another example of a union of subspaces 
is the set of finite rate of innovation signals [20], [21], that are modelled as a weighted sum of shifts of a given 
generating function, where the shifts are unknown. 

In this paper, our goal is to develop a unified framework for efficient recovery of signals that lie in a structured 
union of subspaces. Our emphasis is on computationally efficient methods that are stable in the presence of noise 
and modelling errors. In contrast to our previous work [15]-[18], here we consider unions of finite-dimensional 
subspaces. Specifically, we restrict our attention to the case in which x resides in a sum of k subspaces, chosen 
from a given set of m subspaces Aj,l < j < m. However, which subspaces comprise the sum is unknown. This 
setting is a special case of the more general union model considered in [22], [23]. Conditions under which unique 
and stable sampling are possible were developed in [22], [23]. However, no concrete algorithm was provided to 
recover such a signal from a given set of samples in a stable and efficient manner. Here we propose a convex 
optimization algorithm that will often recover the true underlying x, and develop explicit conditions under which 
perfect recovery is guaranteed. Furthermore, we prove that our method is stable and robust in the sense that the 
reconstruction error is bounded in the presence of noise and mismodelling, namely when x does not lie exactly in 



the union. Our results rely on a generalization of the RIP which fits the union setting we treat here. 

Our first contribution is showing that the problem of recovering x in a structured union of subspaces can be 
cast as a sparse recovery problem, in which it is desired to recover a sparse vector c that has a particular sparsity 
pattern: the non-zero values appear in fixed blocks. We refer to such a model as block sparsity. Clearly any block- 
sparse vector is also sparse in the standard sense. However, by exploiting the block structure of the sparsity pattern, 
recovery may be possible under more general conditions. 

Next, we develop a concrete algorithm to recover a block-sparse vector from given measurements, which is based 
on minimizing a mixed ^2/^1 norm. This problem can be cast as a convex second order cone program (SOCP), and 
solved efficiently using standard software packages. A mixed norm approach for block-sparse recovery was also 
considered in [24], [25]. By analyzing the measurement operator's null space, it was shown that asymptotically, as 
the signal length grows to infinity, and under ideal conditions (no noise or modeling errors), perfect recovery is 
possible with high probability. However, no robust equivalence results were established between the output of the 
algorithm and the true block-sparse vector for a given finite-length measurement vector, or in the presence of noise 
and mismodelling. 

Generalizing the concept of RIP to our setting, we introduce the block RIP, which is a less stringent requirement. 
We then prove that if the measurement matrix satisfies the block RIP, then our proposed convex algorithm will 
recover the underlying block sparse signal. Furthermore, under block RIP, our algorithm is stable in the presence 
of noise and mismodelling errors. Using ideas similar to [12], [26] we then prove that random matrices satisfy 
the block RIP with overwhelming probability. Moreover, the probability to satisfy the block RIP is substantially 
larger than that of satisfying the standard RIP. These results establish that a signal x that lies in a finite structured 
union can be recovered efficiently and stably with overwhelming probability if a certain measurement matrix is 
constructed from a random ensemble. 

An interesting special case of the block-sparse model is the multiple measurement vector (MMV) problem, in 
which we have a set of unknown vectors that share a joint sparsity pattern. MMV recovery algorithms were studied 
in [19], [27]-[30]. Equivalence results based on mutual coherence for a mixed ip/ii program were derived in [28]. 
These results turn out to be the same as that obtained from a single measurement problem. This is in contrast to 
the fact that in practice, MMV methods tend to outperform algorithms that treat each of the vectors separately. In 
order to develop meaningful equivalence results, we cast the MMV problem as one of block-sparse recovery. Our 
mixed £2/^1 method translates into minimizing the sum of the £2 row-norms of the unknown matrix representing 
the MMV set. Our general results lead to RIP-based equivalence conditions for this algorithm. Furthermore, our 
framework suggests a different type of sampling method for MMV problems which tends to increase the recovery 
rate. The equivalence condition we obtain in this case is stronger than the single measurement setting. As we show. 



this method leads to superior recovery rate when compared with other popular MMV algorithms. 

The remainder of the paper is organized as follows. In Section JI] we describe the general problem of sampling 
from a union of subspaces. The relationship between our problem and that of block-sparse recovery is developed 
in Section Hill In Section JV] we explore stability and uniqueness issues which leads to the definition of block RIP. 
We also present a non-convex optimization algorithm with combinatorial complexity whose solution is the true 
unknown x. A convex relaxation of this algorithm is proposed in Section jV] We then derive equivalence conditions 
based on block RIP. The concept of block RIP is further used to establish robustness and stability of our algorithm 
in the presence of noise and modelling errors. This approach is specialized to MMV sampling in Section|Vll Finally, 
in Section IVIII we prove that random ensembles tend to satisfy the block RIP with high probability. 

Throughout the paper, we denote vectors in an arbitrary Hilbert space 7i by lower case letters e.g., x, and sets 
of vectors in Ti by calligraphic letters, e.g., S. Vectors in M^ are written as boldface lowercase letters e.g., x, and 
matrices as boldface uppercase letters e.g., A. The identity matrix of appropriate dimension is written as I or I^ 
when the dimension is not clear from the context, and A^ is the transpose of the matrix A. The zth element of a 
vector X is denoted by x(i). Linear transformations from M" to H are written as upper case letters A : W^ ^ TC. 



The adjoint of A is written as A*. The standard Euclidean norm is denoted ||x||2 = Vx^x and ||x||i = ^- |x(i)| 
is the £i norm of x. The Kronecker product between matrices A and B is denoted A B. The following variables 
are used in the sequel: n is the number of samples, N is the length of the input signal x when it is a vector, k is 
the sparsity or block sparsity (to be defined later on) of a vector c, and m is the number of subspaces. For ease of 
notation we assume throughout that all scalars are defined over the field of real numbers; however, the results are 
also valid over the complex domain with appropriate modifications. 

II. Union of Subspaces 

A. Subspace Sampling 

Traditional sampling theory deals with the problem of recovering an unknown signal x ^ Ti from a set of n 
samples in = fi{x) where fi{x) is some function of x. The signal x can be a function of time x = x{t), or can 
represent a finite-length vector x = x. The most common type of sampling is linear sampling in which 

yi = {si,x), l<i<n, (1) 

for a set of functions Si ^ Ti [4], [31]-[37]. Here {x, y) denotes the standard inner product on Ti. For example, if 
Ti = L2 is the space of real finite-energy signals then 

{x,y)= / x{t)y{t)dt. (2) 



When n = M^ for some N, 

N 
(x,y)=^x(i)y(i). (3) 

i=l 

Nonlinear sampling is treated in [38]. However, here our focus will be on the linear case. 

When H = M^ the unknown x = x as well as the sampling functions Sj = Sj are vectors in M^. Therefore, the 
samples can be written conveniently in matrix form as y = S^x, where S is the matrix with columns Sj. In the 
more general case in which H = L2 or any other abstract Hilbert space, we can use the set transformation notation 
in order to conveniently represent the samples. A set transformation S :M."' ^ TC corresponding to sampling vectors 
{si e7i,l < i < n} is defined by 

n 

Sc = J2<i)si (4) 

for all c e M". From the definition of the adjoint, if c = S*x, then c(i) = (sj, x). Note that when TC = R^ , S = S 
and S* = S^. Using this notation, we can always express the samples as 

y = s*x, (5) 

where 5" is a set transformation for arbitrary TC, and an appropriate matrix when TC = R^. 

Our goal is to recover x from the samples y G M". If the vectors Sj do not span the entire space TC, then there 
are many possible signals x consistent with y. More specifically, if we define by S the sampling space spanned 
by the vectors s,;, then clearly 5*^ = for any v € 5-*-. Therefore, if 5-*- is not the trivial space then adding such 
a vector v to any solution a; of ([5]) will result in the same samples y. However, by exploiting prior knowledge on 
X, in many cases uniqueness can be guaranteed. A prior very often assumed is that x lies in a given subspace A 
of TC [4]-[7]. If A and S have the same finite dimension, and 5-*- and A intersect only at the vector, then x can 
be perfectly recovered from the samples y [6], [7], [39]. 

B. Union of Subspaces 

When subspace information is available, perfect reconstruction can often be guaranteed. Furthermore, recovery 
can be implemented by a simple linear transformation of the given samples ([5]). However, there are many practical 
scenarios in which we are given prior information about x that is not necessarily in the from of a subspace. One 
such case studied in detail in [39] is that in which x is known to be smooth. Here we focus our attention on the 
setting where x lies in a union of subspaces 

U = [_]Vi (6) 

i 

where each Vj is a subspace. Thus, x belongs to one of the Vj, but we do not know a priori to which one [22], [23]. 
Note that the set U is no longer a subspace. Indeed, if Vi is, for example, a one-dimensional space spanned by the 



vector Vj, then U contains vectors of the form avj for some i but does not include their linear combinations. Our 
goal is to recover a vector x lying in a union of subspaces, from a given set of samples. In principle, if we knew 
which subspace x belonged to, then reconstruction can be obtained using standard sampling results. However, here 
the problem is more involved because conceptually we first need to identify the correct subspace and only then can 
we recover the signal within the space. 

Previous work on sampling over a union focused on invertibility and stability results [22], [23]. In contrast, 
here, our main interest is in developing concrete recovery algorithms that are provably robust. To achieve this goal, 
we limit our attention to a subclass of Q for which stable recovery algorithms can be developed and analyzed. 
Specifically, we treat the case in which each Vj has the additional structure 

H = A, (7) 

\j\=k 

where {Aj, 1 < j < m} are a given set of disjoint subspaces, and | j| = k denotes a sum over k indices. Thus, each 
subspace Vj corresponds to a different choice of k subspaces Aj that comprise the sum. We assume throughout 
the paper that m and the dimensions di = dim(^j) of the subspaces Ai are finite. Given n samples 

y = s*x (8) 

and the knowledge that x lies in exactly one of the subspaces Vj, we would like to recover the unknown signal x. 
In this setting, there are (™) possible subspaces comprising the union. 

An alternative interpretation of our model is as follows. Given an observation vector y, we seek a signal x for 
which y = S*x and in addition x can be written as 

k 



Y^Xi, (9) 



X = 

i=l 

where each Xi lies in Aj for some index j. 

A special case is the standard CS problem in which x = x is a vector of length A^, that has a sparse representation 
in a given basis defined by an invertible matrix W. Thus, x = Wc where c is a sparse vector that has at most k 
nonzero elements. This fits our framework by choosing Ai as the space spanned by the ith column of W. In this 
setting m = N, and there are ( ^ ) subspaces comprising the union. 

Another example is the block sparsity model [24], [40] in which x is divided into equal-length blocks of size d, 
and at most k blocks can be non zero. Such a vector can be described in our setting with Ti = M^ by choosing 
Ai to be the space spanned by the corresponding i columns of the identity matrix. Here m = N/d and there are 
{ I) subspaces in the union. 

A final example is the MMV problem [19], [27]-[30] in which our goal is to recover a matrix X from 



measurements Y = MX, for a given sampling matrix M. The matrix X is assumed to have at most k non- 
zero rows. Thus, not only is each column Xj fc-sparse, but in addition the non-zero elements of Xj share a joint 
sparsity pattern. This problem can be transformed into that of recovering a A;-block sparse signal by stacking the 
rows of X and Y, leading to the relationship 

vec(Y^) = (M ® I) vec(X^). (10) 

The structure of X leads to a vector vec(X^) that is fe-block sparse. 

C. Problem Formulation and Main Results 

Given k and the subspaces Ai, we would like to address the following questions: 

1) What are the conditions on the sampling vectors Si,l < i < n in order to guarantee that the sampling is 
invertible and stable? 

2) How can we recover the unique x (regardless of computational complexity)? 

3) How can we recover the unique x in an efficient and stable manner? 

The first question was addressed in [22], [23] in the more general context of unions of spaces (without requiring 
a particular structure such as ([7])). However, no concrete methods were proposed in order to recover x. Here we 
provide efficient convex algorithms that recover a; in a stable way for arbitrary k under appropriate conditions on 
the sampling functions s-i and the spaces Ai. 

Our results are based on an equivalence between the union of subspaces problem assuming ^ and that of 
recovering block-sparse vectors. This allows us to recover x from the given samples by first treating the problem 
of recovering a block fc-sparse vector c from a given set of measurements. This relationship is established in 
the next section. In the reminder of the paper we therefore focus on the block fc-sparse model and develop our 
results in that context. In particular, we introduce a block RIP condition that ensures uniqueness and stability of 
our sampling problem. We then suggest an efficient convex optimization problem which approximates an unknown 
block-sparse vector c. Based on block RIP we prove that c can be recovered exactly in a stable way using the 
proposed optimization program. Furthermore, in the presence of noise and modeling errors, our algorithm can 
approximate the best block-A; sparse solution. 

III. Connection with Block Sparsity 

Consider the model of a signal x in the union of k out of m subspaces Ai, with di = dim(^j) as in Q and 
^. To write x explicitly, we choose a basis for each Ai. Denoting by Ai : M"'" -^ H the set transformation 



corresponding to a basis for Ai, any such x can be written as 

x=^AiCi, (11) 

\i\=k 

where Cj G M* are the representation coefficients in Ai, and |i| = k denotes a sum over a set of k indices. The 
choice of indices depend on the signal x and are unknown in advance. 

To develop the equivalence with block sparsity, it is useful to introduce some further notation. First, we define 
A : M^ — > "H as the set transformation that is a result of concatenating the different Ai, with 



N = J2di. (12) 



Next, we define the ith sub-block c[i] of a length- A^ vector c over X = {di, . . . ,dm}- The ith sub-block is of 
length di, and the blocks are formed sequentially so that 

C^ = [Ci ... Cd, ... CN-d„,+l ■ ■ ■ cat]^. (13) 

^ V ' ^ V ' 

c[l] c[m] 

We can then define A by 

m 

Ac = Y,A^c\i]. (14) 

4 = 1 

When ?{ = M^ for some A^, Ai = Aj is a matrix and A = A is the matrix obtained by column-wise concatenating 
Aj. If for a given x the jth subspace Aj does not appear in the sum Q, or equivalently in ([TT]) . then c[j] = 0. 

Any x in the union (|6]l, (O can be represented in terms of k of the bases Ai. Therefore, we can write x = Ac 
where there are at most k non-zero blocks c[i]. Consequently, our union model is equivalent to the model in which 
X is represented by a sparse vector c in an appropriate basis. However, the sparsity pattern here has a unique form 
which we will exploit in our conditions and algorithms: the non-zero elements appear in blocks. 

Definition 1: A vector c G M^ is called block /c-sparse over Z = {di, . . . , dm} if c[i] is nonzero for at most k 
indices i where A^ = J2i ^i- 
An example of a block-sparse vector with k = 2 is depicted in Fig. [T] When di = I for each i, block sparsity 



^ ^ ^ ^ ■<— 



di = 3 d2 =4 d2 = 2 d4 = 6 rfs = 1 

Fig. 1. A block-sparse vector c over I — {di, . . . ,^5}. The gray areas represent 10 non-zero entries which occupy two blocks. 

reduces to the conventional definition of a sparse vector. Denoting 

m 

||c||o,x = j;/(||c[i]||2>0), (15) 

j=l 



where /(||c[z]||2 > 0) is an indicator function that obtains the value 1 if ||c[i]||2 > and otherwise, a block 
fc-sparse vector c can be defined by ||c||o,j < k. 

Evidently, there is a one-to-one correspondence between a vector x in the union, and a block-sparse vector c. 
The measurements © can also be represented explicitly in terms of c as 

y = S*x = S*Ac = Dc, (16) 

where D is the n x N matrix defined by 

D = S*A. (17) 

We can therefore phrase our problem in terms of D and c as that of recovering a block-fe sparse vector c over I 
from the measurements ( fT6l ). 

Note that the choice of basis Ai for each subspace does not affect our model. Indeed, choosing alternative bases 
will lead to x = ylWc where W is a block diagonal matrix with blocks of size di. Defining c = Wc, the block 
sparsity pattern of c is equal to that of c. 

Since our problem is equivalent to that of recovering a block sparse vector over I from linear measurements 
y = Dc, in the reminder of the paper we focus our attention on this problem. 

IV. Uniqueness and Stability 

In this section we study the uniqueness and stability of our sampling method. These properties are intimately 
related to the RIP, which we generalize here to the block-sparse setting. 

The first question we address is that of uniqueness, namely conditions under which a block-sparse vector c is 
uniquely determined by the measurement vector y = Dc. 

Proposition 1: There is a unique block- /c sparse vector c consistent with the measurements y = Dc if and only 
if Dc 7^ for every c / that is block 2k-spaise. 

Proof: The proof follows from [22, Proposition 4]. ■ 

We next address the issue of stability. A sampling operator is stable for a set T if and only if there exists 
constants a > 0, /3 < oo such that 

Ci\\xi - X2\\n < \\S*Xi - 5*X2||2 < P\\xi - X2\\'ii, (1^) 

for every xi, X2 in T. The ratio k = [3 /a provides a measure for stability of the sampling operator. The operator is 
maximally stable when k = 1. In our setting, S* is replaced by D, and the set T contains block-A; sparse vectors. 
The following proposition follows immediately from (fTSl ) by noting that given two block-/c sparse vectors ci,C2 
their difference ci — C2 is block-2/i; sparse. 
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Proposition 2: The measurement matrix D is stable for every block fc-sparse vector c if and only if there exists 
Ci > and C2 < oo such that 

Ci||v||^< ||Dv||^<C2||v||2, (19) 

for every v that is block 2A;-sparse. 

It is easy to see that if D satisfies (f79l ) then Dc / for all block 2A;-sparse vectors c. Therefore, this condition 

implies both invertibility and stability. 



A. Block RIP 



Property ( [T9b is related to the RIP used in several previous works in CS [9], [13], [14]. A matrix D of size 
n X A*" is said to have the RIP if there exists a constant 6k G [0, 1) such that for every A;-sparse c e M^, 



(l-5fe)||c||i<||Dc||i<(l + <5fc)||c||i. 



(20) 



Extending this property to block-sparse vectors leads to the following definition: 

Definition 2: Let D : M^ -^ M" be a given matrix. Then D has the block RIP over X = {di, . . . , dm} with 
parameter 5uj if for every c G M^ that is block A;-sparse over X we have that 

{l-6k\x)Ml<\\Dc\\l<{l + 6k\x)\\c\\l (21) 

By abuse of notation, we use 5k for the block-RIP constant 5k\i when it is clear from the context that we refer to 
blocks. Block-RIP is a special case of the ^-restricted isometry defined in [23]. From Proposition [U it follows that 
if D satisfies the RIP ((2TI) with 62k < 1> then there is a unique block-sparse vector c consistent with (IT6l ). 

Note that a block /c-sparse vector over X is M-sparse in the conventional sense where M is the sum of the k 
largest values in X, since it has at most M nonzero elements. If we require D to satisfy RIP for all M-sparse 
vectors, then (1211 ) must hold for all 2M-sparse vectors c. Since we only require the RIP for block sparse signals, 
(I2TI ) only has to be satisfied for a certain subset of 2M-sparse signals, namely those that have block sparsity. As 
a result, the block-RIP constant 5k\i is typically smaller than 5m (where M depends on k; for blocks with equal 
size d, M = kd). 

To emphasize the advantage of block RIP over standard RIP, consider the following matrix, separated into three 
blocks of two columns each: 



D 



l-l 1 








l\ 


2 


-1 





3 


3 





-1 


1 


V 1 








-1 V 



B, 



(22) 



11 

where B is a diagonal matrix that results in unit-norm columns of D, i.e., B = diag (1, 15, 1, 1, 1, 12)^^/^. In this 
example m = 3 and T = {di = 2,d2 = 2, da = 2}. Suppose that c is block- 1 sparse, which corresponds to at most 
two non-zero values. Brute-force calculations show that the smallest value of 62 satisfying the standard RIP (|20l ) is 
62 = 0.866. On the other hand, the block-RIP (|2TI ) corresponding to the case in which the two non-zero elements 
are restricted to occur in one block is satisfied with 6i\j = 0.289. Increasing the number of non-zero elements to 
/c = 4, we can verify that the standard RIP (l20l ) does not hold for any ^4 G [0, 1). Indeed, in this example there 
exist two 4-sparse vectors that result in the same measurements. In contrast, 62\j = 0.966 satisfies the lower bound 
in (I2TI ) when restricting the 4 non-zero values to two blocks. Consequently, the measurements y = Dc uniquely 
specify a single block-sparse c. 

In the next section, we will see that the ability to recover c in a computationally efficient way depends on 
the constant 62k\x i" the block RIP (|2TI ). The smaller the value of d2k\x, the fewer samples are needed in order 
to guarantee stable recovery. Both standard and block RIP constants dk,6f^\j are by definition increasing with k. 
Therefore, it was suggested in [12] to normalize each of the columns of D to 1, so as to start with 61 = 0. In the 
same spirit, we recommend choosing the bases for Ai such that D = S*A has unit-norm columns, corresponding 
to 6i\j = 0. 

B. Recovery Method 

We have seen that if D satisfies the RIP (|2TI ) with d2k < 1. then there is a unique block-sparse vector c consistent 
with (IT6l ). The question is how to find c in practice. Below we present an algorithm that will in principle find the 
unique c from the samples y. Unfortunately, though, it has exponential complexity. In the next section we show 
that under a stronger condition on S2k we can recover c in a stable and efficient manner. 

Our first claim is that c can be uniquely recovered by solving the optimization problem 

min ||c||ox 

c ' 

s.t. y = Dc. (23) 

To show that (1231 ) will indeed recover the true value of c, suppose that there exists a c' such that Dc' = y and 
||c'||o,j < l|c||o,J < k. Since both c,c' are consistent with the measurements, 

= D(c-c') = Dd, (24) 

where ||d||o,j < 2k so that d is a block 2A;-sparse vector. Since D satisfies (|2TI ) with 52k < 1, we must have that 
d = or c = c'. 
In principle (1231) can be solved by searching over all possible sets of k blocks whether there exists a c that is 



12 

consistent with the measurements. The invertibility condition (|2TI ) ensures that there is only one such c. However, 
clearly this approach is not efficient. 

V. Convex Recovery Algorithm 
A. Noise-Free Recovery 

We now develop an efficient convex optimization problem instead of ( |23] ) to approximate c. As we show, if D 
satisfies (|2TI) with a small enough value of 62k, then the method we propose will recover c exactly. 

Our approach is to minimize the sum of the energy of the blocks c[i]. To write down the problem exphcitly, we 
define the mixed ^2/^1 norm over the index set T = {di, . . . , dm} as 

m 



\chi = Y.\\c[i]h- (25) 



i=l 



The algorithm we suggest is then 

min ||c||2J 

c ' 

s.t. y = Dc. (26) 

Problem (l26l ) can be written as an SOCP by defining tj = ||c[i]||2. Then (l26l ) is equivalent to 





m 


min 

C,ti 


L'- 




i=l 


s.t. 


y = Dc 




ti> ||c[i]||2, l<i<m 




ti >0, 1 < i < m, 



(27) 

which can be solved using standard software packages. 

The next theorem establishes that the solution to (l26l) is the true c as long as 62k is small enough. 

Theorem 1: Let y = Dcq be measurements of a block /c-sparse vector cq. If D satisfies the block RIP (|2TI ) with 
d2k < V^ - 1 then 

1) there is a unique block- /c sparse vector c consistent with y; 

2) the SOCP (I27] ) has a unique solution; 

3) the solution to the SOCP is equal to cq. 

Before proving the theorem we note that it provides a gain over standard CS results. Specifically, it is shown in 
[14] that if c is /c-sparse and the measurement matrix D satisfies the standard RIP with 52k < V^ — 1> then c can 
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be recovered exactly from the measurements y = Dc via the linear program: 

min ||c||i 

c 

s.t. y = Dc. (28) 

Since any block A;-sparse vector is also M-sparse with M equal to the sum of the k largest values of di, we can 

find Co of Theorem [1] by solving ( [28] ) if 62M is small enough. However, this standard CS approach does not exploit 

the fact that the non-zero values appear in blocks, and not in arbitrary locations within the vector cq. On the other 

hand, the SOCP (|27l ) explicitly takes the block structure of cq into account. Therefore, the condition of Theorem [T] 

is not as stringent as that obtained by using equivalence results with respect to ( [28] ). Indeed, the block RIP ([2T] ) 

bounds the norm of ||Dc|| over block sparse vectors c, while the standard RIP considers all possible choices of c, 

also those that are not 2A;-block sparse. Therefore, the value of 62k in ([2T]) can be lower than that obtained from 

([20] ) with k = 2M, as we illustrated by an example in Section [HI] This advantage will also be seen in the context 

of a concrete example at the end of the section. 

Our proof below is rooted in that of [14]. However, some essential modifications are necessary in order to adapt 

the results to the block-sparse case. These differences are a result of the fact that our algorithm relies on the mixed 

£2/^1 norm rather than the £1 norm alone. This adds another layer of complication to the proof, and therefore we 

expand the derivations in more detail than in [14]. 

Proof: We first note that 62k < 1 guarantees uniqueness of cq from Proposition [2 To prove parts 2) and 3) 

we show that any solution to ([26] ) has to be equal to Cq. To this end let c' = Cq + h be a solution of ([26] ). The true 

value Co is non-zero over at most k blocks. We denote by Zq the block indices for which cq is nonzero, and by 

hxg the restriction of h to these blocks. Next we decompose h as 

i-i 
h = J^hx,, (29) 

i=0 

where hj. is the restriction of h to the set 2j which consists of k blocks, chosen such that the norm of hj-^ over Ti 
is largest, the norm over X2 is second largest and so on. Our goal is to show that h = 0. We prove this by noting 
that 

||h||2 = ||hioUXi + h(i^uJi)-ll2 < llhjoUXilb + ||h(ioUXi)-l|2- (30) 

In the first part of the proof we show that ||h(X|^ux^)c||2 < Hhj^uXilb- In the second part we establish that 
llhxnUXilb = 0, which completes the proof. 

Part I:\\h(^j^^yjj^)a\\2 < Hhi^uxJb 
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We begin by noting that 



|h 



{JoUXi)H|2 



i=2 



i-1 
2 i=2 



(31) 



Therefore, it is sufficient to bound llbj 11 2 for i > 2. Now, 



hrMo < k'^^WhrAUr < k-'/^\\h 



lXi_il|2,X, 



(32) 



where we defined ||a||oo,x = maxj ||a[z]||2. The first inequality follows from the fact that for any block /c-sparse c. 



\i\=k 



(33) 



The second inequality in (132] ) is a result of the fact that the norm of each block in hj. is by definition smaller 
or equal to the norm of each block in hj^ j. Since there are at most k nonzero blocks, A;||hxJ|oo,x < ||hi^_i ||2,x- 
Substituting ^ into (ISTT) . 

£-2 i-i 



|h(XoUXO^Il2 < A:-^/2^ ||hxj|2,x < A:-i/2 j; ||hxj|2,x = k~'/^hj.\\2,i, 



(34) 



i=l i=l 

where the equality is a result of the fact that ||ci + C2 ||2,x = ||ci ||2,x + ||c2 ||2,x if ci and C2 are non-zero on disjoint 
blocks. 

To develop a bound on ||hxc||2x note that since c' is a solution to (l26l) . ||co||2,x > ||c'||2,x- Using the fact that 
c' = Co + hx„ + hxj'j and cq is supported on Zq we have 



|co||2,X > ||C0 + hXol|2,X + ||hXnH|2,X > l|co||2,X — ||hXo||2,X + Hhx^Hb.X, 



from which we conclude that 



|hx5l|2,x < ||hxol|2,x < A:^/^||hxJ|2. 



(35) 



(36) 



The last inequality follows from applying Cauchy-Schwarz to any block fc-sparse vector c: 

\i\=k 



|h(XoUXi)-l|2 < llhxolb < llhxoUXilb, 



Substituting (136]) into (134] 



(37) 



(38) 



which completes the first part of the proof. 
Part II:\\hx„uI^h = 
We next show that hxoUXi rnust be equal to 0. In this part we invoke the RIP. 
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Since Dcq = Dc' = y, we have Dh = 0. Using the fact that h = hj^uJi + J2i>2 ^it^ 

e-i 
||Dhx„ux, \\l = -Yl (D(hxo + hxj, Dhx.). (39) 

From the parallelogram identity and the block-RIP it can be shown that 

|(Dci,DC2)| <(^2fc||ci||2||c2||2, (40) 

for any two block A;-sparse vectors with disjoint support. The proof is similar to [14, Lemma 2.1] for the standard 
RIP. Therefore, 

|(Dhx„,DhjJ| < 52fc||hxo||2||hxJ|2, (41) 

and similarly for (Dhj^jDhiJ. Substituting into (|39l ). 

e-i 
||Dhx„uxJll = ^(D(hx„ + hxJ,DhxJ 

1=2 

e-1 

< j;(KDhx„,Dhx.)| + KDhx,,DhxJ|) 

j=2 

e-1 

< (^2fc(||hxJ|2 + ||hxJ|2)^||hjJ|2. (42) 

i=2 

From the Cauchy-Schwarz inequality, any length-2 vector a satisfies a(l) + a(2) < \/2||a||2. Therefore, 

llhxolb + ||hxj|2 < \/2^||hx„ Hi + ||hx, 111 = \/2||hx,uJ, lb, (43) 



where the last equality is a result of the fact that hx^ and hj^ have disjoint support. Substituting into (142]) and 
using (I32]l, dMll and ^, 



|DhxoUXill2 < v2A; ' hkWiMxA^V^x-h;! 



< V2(52fc||hXoUXi||2||hXo||2 

< ^/2,52fc||hxoUxJl2, (44) 
where the last inequality follows from ||hxo||2 < HhxoUJilb- Combining (|44] ) with the RIP (|2T| | we have 

(1 - <52fc)||hxoUxJ|i < ||Dhx„uxJli < V262k\\hi,uiA\l (45) 

Since 52k < V^ — 1, ( l45] ) can hold only if ||hxoUJi II2 = 0, which completes the proof. ■ 

We conclude this subsection by pointing out more explicitly the differences between the proof of Theorem [1] and 

that of [14]. The main difference begins in (l32l) : in our formulation each of the subvectors hx, may have a different 



16 

number of non-zero elements, while the equivalent equation in [14] (Eq. (10)) relies on the fact that the maximal 
number of non-zero elements in each of the subvectors is the same. This requires the use of several mixed-norms 
in our setting. The rest of the proof follows the spirit of [14] where in some of the inequalities conventional norms 
are used, while in others the adaptation to our setting necessitates mixed norms. 

B. Robust Recovery 

We now treat the situation in which the observations are noisy, and the vector cq is not exactly block-A; sparse. 
Specifically, suppose that the measurements (IT6l) are corrupted by bounded noise so that 

y = Dc + z, (46) 

where ||z||2 < e. In order to recover c we use the modified SOCP: 

min ||c||2J 

c ' 

s.t. ||y- Delia <e. (47) 

In addition, given a c G R^, we denote by c^ the best approximation of c by a vector with k non-zero blocks, so 
that c^ minimizes ||c — d||2,j over all block /c-sparse vectors d. Theorem [2] shows that even when c is not block 
/c-sparse and the measurements are noisy, the best block-A; approximation can be well approximated using ( |47l ). 

Theorem 2: Let y = Dcq + z be noisy measurements of a vector cq. Let c^ denote the best block A;-sparse 
approximation of cq, such that c^ is block A;-sparse and minimizes ||co — d||2,j over all block A;-sparse vectors d, 
and let c' be a solution to (|47l). If D satisfies the block RIP dlT]) with 52fc < \/2 - 1 then 

11^ ^'11 2{l-52k) , -1/211 fcii , 4V1 + 52k ,.„. 

Co — C 2 < 1= k ' Cn — C 2XH 1= £• (48) 

'"- 1-(1 + V2)<52fc "° "• 1 - (1 + x/2),52fc 

Before proving the theorem, note that the first term in (1481 ) is a result of the fact that cq is not exactly fc-block 

sparse. The second expression quantifies the recovery error due to the noise. 

Proof: The proof is very similar to that of Theorem [J with a few differences which we indicate. These changes 

follow the proof of [14, Theorem 1.3], with appropriate modifications to address the mixed norm. 

Denote by c' = cq + h the solution to (|47] ). Due to the noise and the fact that cq is not block /c-sparse, we 
wiU no longer obtain h = 0. However, we will show that ||h||2 is bounded. To this end, we begin as in the proof 
of Theorem [T] by using (l30l ). In the first part of the proof we show that ||h(x^uj^)c||2 < Hhj^uJilb + 2eo where 
eo = A;^^'^||co — cx,J|2,i and cj,, is the restriction of cq onto the k blocks corresponding to the largest £2 norm. 
Note that cj^ = c'^. In the second part, we develop a bound on ||hiQUXi lb- 

Parti: Bound on \^{Xq-^sz^y\\2 
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We begin by decomposing h as in the proof of Theorem \T\ The inequaUties until ( [35] ) hold here as well. Instead 
of (|35]) we have 



||co||2,x > ||cio + hxo||2,X + ||cx= + hjc||2_j > ||cXo||2,X - ||hjJ|2,X + ||hxo||2,X - ||cxc||2,x. (49) 

Therefore, 

||hjc||2,i < 2||cxc||2_x + ||hij|2,j, (50) 

where we used the fact that ||co||2,x — ||cjo||2,j = ||cjj||2,j- Combining (l34l) . (l37l ) and (l50l ) we have 

l|h{joUXi)<=ll2 < ||hxJ|2 + 2eo < ||hj„uJi II2 + 2eo, (51) 

where eo = k-'^/'^\\co - cij|2,j. 

Part II: Bound on ||hx„uXj II2 

Using the fact that h = hj^uXi + X]i>2 ^^2:, we have 

(.-1 
||Dhx„uxJli = (Dhx„ux,,Dh) - Y, (D(hx« + hxJ,Dhi.). (52) 

i=2 

From dlTT ). 

|(Dhx„ux,,Dh)| < ||Dhx,uxJ|2||Dh||2 < ^1 + 52fc||hx„uxJ|2||Dh||2. (53) 

Since both c' and cq are feasible 

||Dh||2 = ||D(co - c')||2 < IIDco - y||2 + ||Dc' - y||2 < 2e, (54) 

and (l53l) becomes 



|(Dhj„ux,,Dh)| < 2eVr+^l|hx„uxJ|2. (55) 

Substituting into (l52l ). 

i-i 
llDhxoUxJii < KDhxoUX.,Dh)|+J]|(D(hxo+hxJ,Dhx.)| 

i=2 

< 2eyrT^||hx„uxJ|2 + 5^KD(hx„+hxJ,DhxJ|. (56) 

i=2 

Combining with (02]) and (04]), 

||Dhj„uxJl2 < (2e^l + 62k + V252kk-^^^\\hi^h,x) llhxouxjb- (57) 



Using (|37] ) and (|50l ) we have the upper bound 



|Dhi„ujJli < (2e Vl + ^2fc + \/2(52fc(||hx„ || + 2eo)) Hhj^ujJb- 



On the other hand, the RIP results in the lower bound 
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(58) 



|Dhx„uxJli>(l-<^2fc)||hx„uxJli. 



(59) 



From (ESI) and §9]), 



(1 -(52fc)||hx„ujJ|2 < 2e7rT^+ V252fc(||hx„uxJ| +2eo), 



(60) 



or 



|hx, 



"0UJ1II2 



< 



2VTT5^ 



-e + 



2V26: 



2k 



-eo- 



1 - (1 + V2)52k 1 - (1 + V2)52k 
The condition 62k < V^ — 1 ensures that the denominator in (|6TI ) is positive. Substituting (|6TI ) results in 

||h||2 < llhxnUJilb + ||h(XoUJi)<^l|2 < 2||hxnUJi lb + 2eo, 



(61) 



(62) 



which completes the proof of the theorem. ■ 

To summarize this section we have seen that as long as D satisfies the block-RIP (|2T]) with a suitable constant, 
any block-A; sparse vector can be perfectly recovered from its samples y = Dc using the convex SOCP (l26l ). This 
algorithm is stable in the sense that by slightly modifying it as in (l47l) it can tolerate noise in a way that ensures 
that the norm of the recovery error is bounded by the noise level. Furthermore, if c is not block fc-sparse, then its 
best block-sparse approximation can be approached by solving the SOCP. These results are summarized in Table J] 
In the table, 62k refers to the block RIP constant. 

TABLE I 

Comparison of algorithms for signal recovery from y = Dco + z 





Algorithm (1261l 


Algorithm (147 b 


Co 


block A;-sparse 


arbitrary 


Noise z 


none (z = 0) 


bounded z 2 < e 


Condition on D 


S2k < \/2 - 1 


62k < V2 - 1 


Recovery c' 


c' = Co 


cn — c' 9 small; see (|481l 



C. Advantage of Block Sparsity 

The standard sparsity model considered in CS assumes that x has at most k non-zero elements, however it does 
not impose any further structure. In particular, the non-zero components can appear anywhere in the vector. There 
are many practical scenarios in which the non-zero values are aligned to blocks, meaning they appear in regions. 
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and are not arbitrarily spread throughout the vector. One example in the structured union of subspaces model we 
treat in this paper. Other examples are considered in [25]. 

Prior work on recovery of block-sparse vectors [24] assumed consecutive blocks of the same size. It was sown that 
in this case, when n, A^ go to infinity, the algorithm (l26l) will recover the true block-sparse vector with overwhelming 
probability. Their analysis is based on characterization of the null space of D. In contrast, our approach relies on 
RIP which allows the derivation of uniqueness and equivalence conditions for finite dimensions and not only in 
the asymptotic regime. In addition. Theorem |2] considers the case of mismodelling and noisy observations while in 
[24] only the ideal noise-free setting is treated. 

To demonstrate the advantage of our algorithm over standard basis pursuit (|28] ). consider the matrix D of (l22l ). In 
Section |Vl the standard and block RIP constants of D were calculated and it was shown that block RIP constants are 
smaller. This suggests that there are input vectors x for which the mixed ^2/^1 method of ( |26l ) will be able to recover 
them exactly from measurements y = Dc while standard £1 minimization will fail. To illustrate this behavior, let 
X = [0,0, 1, —1, —1,0.1]^ be a 4-sparse vector, in which the non-zero elements are known to appear in blocks of 
length 2. The prior knowledge that x is 4-sparse is not sufficient to determine x from y. In contrast, there is a unique 
block-sparse vector consistent with y. Furthermore, our algorithm which is a relaxed version of (1231 ). finds the correct 
X while standard £1 minimization fails in this case; its output is x = [-0.0289,0,0.9134, -1.0289, -1.0289,0]. 

We further compare the recovery performance of £1 minimization (l28l ) and our algorithm (l26l ) for an extensive 
set of random signals. In the experiment, we draw a matrix D of size 25 x 50 from the Gaussian ensemble. The 
input vector x is also randomly generated as a block-sparse vector with blocks of length 5. We draw 1 < k < 25 
non-zero entries from a zero-mean unit variance normal distribution and divide them into blocks which are chosen 
uniformly at random within x. Each of the algorithms is executed based on the measurements y = Dx. In Fig. |2]we 
plot the fraction of successful reconstructions for each k over 500 experiments. The results illustrate the advantage 
of incorporating the block-sparsity structure into the optimization program. An interesting feature of the graph is 
that when using the block-sparse recovery approach, the performance is roughly constant over the block-length (5 
in this example). This explains the performance advantage over standard sparse recovery. 

VI. Application to MMV Models 

We now specialize our algorithm and equivalence results to the MMV problem. This leads to two contributions 
which we discuss in this section: The first is an equivalence result based on RIP for a mixed-norm MMV algorithm. 
The second is a new measurement strategy in MMV problems that leads to improved performance over conventional 
MMV methods, both in simulations and as measured by the RIP-based equivalence condition. In contrast to previous 
equivalence results, for this strategy we show that even if we choose the worst possible X, improved performance 
over the single measurement setting can be guaranteed. 
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Fig. 2. Recovery rate of block-sparse signals using standard (.\ minimization (basis pursuit) and the mixed ^"2/^1 algorithm. 



A. Equivalence Results 

As we have seen in Section lUl a special case of block sparsity is the MMV model, in which we are given 
a matrix of measurements Y = MX where X is an unknown L x d matrix that has at most k non-zero rows. 
Denoting by c = vec(X-^), y = vec(Y^), D = M-'" (g) I^ we can express the vector of measurements y as y = Dc 
where c is a block sparse vector with consecutive blocks of length d. Therefore, the results of Theorems [T] and |2] 
can be specified to this problem. 

Recovery algorithms for MMV using convex optimization programs were studied in [28], [30] and several greedy 
algorithms were proposed in [27], [29]. Specifically, in [27]-[30] the authors study a class of optimization programs, 
which we refer to as M-BP: 

L 



M-BP(£g): mm^\\X.'\\P s.t. Y = ]V[X, 



(63) 



where X* is the ith row of X. The choice p = l,q = 00 was considered in [30], while [28] treated the case of 
p = 1 and arbitrary q. Using p < 1 and q = 2 was suggested in [27], [41], leading to the iterative algorithm 
M-FOCUSS. For p = l,q = 2, the program (1631 ) has a global minimum which M-FOCUSS is proven to find. A 
nice comparison between these methods can be found in [30]. Equivalence for MMV algorithms based on RIP 
analysis does not appear in previous papers. The most detailed theoretical analysis can be found in [28] which 
establishes equivalence results based on mutual coherence. The results imply equivalence for (1631 ) with p = I under 
conditions equal to those obtained for the single measurement case. Note that RIP analysis typically leads to tighter 
equivalence bounds than mutual coherence analysis. 

In our recent work [19], we suggested an alternative approach to solving MMV problems by merging the d 
measurement columns with random coefficients and in such a way transforming the multiple measurement problem 
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into a single measurement counterpart. As proved in [19], this technique preserves the non-zero location set with 
probability one thus reducing computational complexity. Moreover, we showed that this method can be used to 
boost the empirical recovery rate by repeating the random merging several times. 

Using the block-sparsity approach we can alternatively cast any MMV model as a single measurement vector 
problem by deterministic ally transforming the multiple measurement vectors into the single vector model vec(Y^) = 
(M (gi Irf) vec(X-^), where c = vec(X-^) is block-A; sparse with consecutive blocks of length d. In contrast to [19] 
this does not reduce the number of unknowns so that the computational complexity of the resulting algorithm is 
on the same order as previous approaches, and also does not offer the opportunity for boosting. However, as we 
see in the next subsection, with an appropriate choice of measurement matrix this approach results in improved 
recovery capabilities. 

Since we can cast the MMV problem as one of block-sparse recovery, we may apply our equivalence results 
of Theorem [U to this setting leading to RIP-based equivalence. To this end we first note that applying the SOCP 
(l26l) to the effective measurement vector y is the same as solving (l63l) with p = l,q = 2. Thus the equivalence 
conditions we develop below relate to this program. Next, if z = Dc where c is a block 2A;-sparse vector and 
D = M (g) I(^, then taking the structure of D into account, Z = MX where X is a size L x d matrix whose ith 
row is equal to c[i], and similarly for Z. The block sparsity of c implies that X has at most 2k non-zero rows. 
The squared £2 norm ||z||2 is equal to the squared £2 norm of the rows of Z which can be written as 

Ml = \\Zfp = Tt{Z^Z), (64) 

where ||Z||p denotes the Frobenius norm. Since ||c||o = ||X||p the RIP condition becomes 

(1 - 62k) Tr(X^X) < Tr(X^M^MX) < (1 + ^2^) TV(X^X), (65) 

for any L x d matrix X with at most 2k non-zero rows. 

We now show that ( [65] ) is equivalent to the standard RIP condition 

(1 - S2k)Ml < l|Mx||2 < (1 + 62k)Ml (66) 



for any length L vector x that is 2A;-sparse. To see this, suppose first that (I65l ) is satisfied for every matrix X with at 
most 2k non-zero rows and let x be an arbitrary 2A;-sparse vector. If we define X to be the matrix whose columns 
are all equal to x, then X will have at most 2k non-zero rows and therefore satisfies (1651 ). Since the columns of 
X are all equal, Tr(X^X) = d||x||| and Tr(X^]V[^]V[X) = (i||Mx||^ so that §6^ holds. Conversely, suppose 
that (l66l ) is satisfied for all 2A;-sparse vectors x and let X be an arbitrary matrix with at most 2k non-zero rows. 
Denoting by Xj the columns of X, each Xj is 2/c-sparse and therefore satisfies (l66l l. Summing over all values j 
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results in ([65]). 

To summarize, if M satisfies the conventional RIP condition (l66l ). then the algorithm (l63l ) with p = l,q = 2 
will recover the true unknown X. This requirement reduces to that we would obtain if we tried to recover each 
column of X separately, using the standard ii approach (|28] ). As we already noted, previous equivalence results 
for MMV algorithms also share this feature. Although this condition guarantees that processing the vectors jointly 
does not harm the recovery ability, in practice exploiting the joint sparsity pattern of X via ( [63] ) leads to improved 
results. Unfortunately, this behavior is not captured by any of the known equivalence conditions. This is due to the 
special structure of D = M (g) I. Since each measurement vector y^ is affected only by the corresponding vector 
Xj, it is clear that in the worst-case we can choose Xj = x for some vector x. In this case, all the yjS are equal 
so that adding measurement vectors will not improve our recovery ability. Consequently, worst-case analysis based 
on the standard measurement model for MMV problems cannot lead to improved performance over the single 
measurement case. 

B. Improved MMV Recovery 

We have seen that the pessimistic equivalence results for MMV algorithms is a consequence of the fact that in 
the worst-case scenario in which Xj = x, using a separable measurement strategy will render all observation vectors 
equal. In this subsection we introduce an alternative measurement technique for MMV problems that can lead to 
improved worst-case behavior, as measured by RIP, over the single channel case. 

One way to improve the analytical results is to consider an average case analysis instead of a worst-case approach. 
In [42] we show that if the unknown vectors Xj are generated randomly, then the performance improves with 
increasing number of measurement vectors. The advantage stems from the fact that the situation of equal vectors 
has zero probability and therefore does not affect the average performance. Here we take a different route which 
does not involve randomness in the unknown vectors, and leads to improved results even in the worst-case (namely 
without requiring an average analysis). 

To enhance the performance of MMV recovery, we note that when we allow for an arbitrary (unstructured) D, 
the RIP condition of Theorem [T] is weaker than the standard RIP requirement for recovering /c-sparse vectors. This 
suggests that we can improve the performance of MMV methods by converting the problem into a general block 
sparsity problem, and then sampling with an arbitrary unstructured matrix D rather than the choice D = M^ ® I^. 
The tradeoff introduced is increased computational complexity since each measurement is based on all input vectors. 
The theoretical conditions will now be looser, since block-RIP is weaker than standard RIP. Furthermore, in practice, 
this approach often improves the performance over separable MMV measurement techniques as we illustrate in the 
following example. 

In the example, we compare the performance of several MMV algorithms for recovering X in the model Y = 
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Fig. 3. Recovery rate for different number k of non-zero rows in X. Each point on the graph represents an average recovery rate over 500 
simulations. 



MX, with our method based on block sparsity in which the measurements y are obtained via y = Dc where 
c = vec(X^) and D is a dense matrix. Choosing D as a block diagonal matrix with blocks equal to M results in 
the standard MMV measurement model. The effective matrices D have the same size in the case in which it is block 
diagonal and when it is dense. To compare the performance of (|26l ) with a dense D to that of ( [63] ) with a block 
diagonal D, we compute the empirical recovery rate of the methods in the same way performed in [19]. The matrices 
M and D are drawn randomly from a Gaussian ensemble. In our example, we choose ^ = 20, L = 30, d = 5 where 
i is the number of rows in Y. The matrix X is generated randomly by first selecting the k non-zero rows uniformly 
at random, and then drawing the elements in these rows from a normal distribution. The empirical recovery rates 
using the methods of ([63] ) for different choices of q and p, ReMBO [19] and our algorithm ([26] ) with dense D are 
depicted in Fig. [3] When the index p is omitted it is equal to 1. Evidently, our algorithm performs better than most 
popular optimization techniques for MMV systems. We stress that the performance advantage is due to the joint 
measurement process rather than a new recovery algorithm. 



VII. Random Matrices 

Theorems [I] and [2] establish that a sufficiently small block RIP constant 62k\i ensures exact recovery of the 
coefficient vector c. We now prove that random matrices are likely to satisfy this requirement. Specifically, we 
show that the probability that 6k\j exceeds a certain threshold decays exponentially in the length of c. Our approach 
relies on results of [12], [26] developed for standard RIP, however, exploiting the block structure of c leads to a 
much faster decay rate. 

Proposition 3: Suppose D is an n x A^ matrix from the Gaussian ensemble, namely [T)]ik ~ AA(0, yj. Let 6k\T 
be the smallest value satisfying the block RIP ([2T] ) over T = {di = d, . . . ,dm = d}, assuming A^ = md for some 
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integer m. Then, for every e > the block RIP constant 6k\j obeys (for n, A^ large enough, and fixed d) 

Prob (Jl + 5k\j > 1 + (1 + e)/(r)) < 26"^^^^ • e-'^^'^-^Wr) . (67) 



Here, the ratio r = kd/N is fixed, /(r) = \/ ^ ( V^+ \/'^H{r) ), and i:/^(g) = — glogg— (1 — (?) log(l — g) is the 
entropy function defined for < g < 1. 

The assumption that di = d simplifies the calculations in the proof. Following the proof, we shortly address the 
more difficult case in which the blocks have varying lengths. We note that Proposition |3] reduces to the result of 
[12] when d = I. However, since /(r) is independent of d, it follows that for d > 1 and fixed problem dimensions 
n, N, r, block-RIP constants are smaller than the standard RIP constant. The second exponent in the right-hand 
side of (|67] ) is responsible for this behavior. 
Proof: Let A = (1 + e)f{r) and define 

CT = max o-max(DT), 01= mill a^inCDT), (68) 

\T\=k,d \T\=k,d 

where (Tmax(DT), o"min(DT)i are the largest and the smallest singular values of Dy, respectively. We use \T\ = k,d 
to denote a column subset of D consisting of k blocks of length d. For brevity we omit subscripts and denote 
6 = 6j.n- The inequalities in the definition of block-RIP (|2TI) imply that 

1 + 5 >a^ (69) 

l-5<a^. (70) 

Since 6 is the smallest number satisfying these inequalities we have that 1 + 6 = max(CT^, 2 — g^). Therefore, 



Prob(Vl + 5> 1 + A) =Prob(\/max(a2,2-o:2) > 1 + A) (71) 



< Prob(a > 1 + A) + Prob(\/2 - ^2 > 1 + A). (72) 



Noting that a > 1 — A impUes y^2 — a^ < 1 + A we conclude that 



Prob {Vl + S> 1 + A < Prob(CT > 1 + A) + Prob(o: < 1 - A). (73) 



We now bound each term in the right-hand-side of (1731 ) using a result of Davidson and Szarek [43] regarding 
the concentration of the extreme singular values of a Gaussian matrix. It was proved in [43] that an m x n matrix 
X with n> m satisfies 



Prob(a„ax(X) > 1 + y;^ + t)< e-"*'/2 (74) 

Prob(amin(X) < 1 - 7^ -t)< e^"*'/^ (75) 
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Applying a union bound leads to 
Prob I CT > 1 



V^+M^ Yl Prob[cJn,ax(DT)>l + A/^ + t] (76) 

"■ / \T\=k,d \ n J 

|T|=A;,d 

"Ae-^y^. (78) 

Using the well-known bound on the binomial coefficient (for sufficiently large m) 

'^\ ^^mH{k/m)^ (79) 

we conclude that 

Prob [ a > 1 + J^ +t\ < ernH(klm)^-nei2_ (gg^ 

To utilize this result in (1731 ) we rearrange 

1 + A = 1 + (1 + £)/(,•) (81) 



1 + (1 + .) J^ + J^ffM (82) 

\ V n V n / 



>l + l/- + J(l + e)— /7(r) (83) 

V n V n 



and obtain that 



Prob (a > 1 + A) < Prob ( a > 1 + \\— + J (1 + e)—H{r) ] . (84) 

\ V n V n I 



Using dSOll leads to 



77-/7 / \ nil-leyiN Hir) 

Prob (a > 1 + A) < e"^^(fe/'")e- 2,. ' (85) 

= giVH(r)-m(d-l)H(r)-(l+£)7VH{r) (-gg^ 

< g-ArH(r)eg-m(d-l)H{r)_ ^^^^ 

Similar arguments are used to bound the second term in (1731 ). completing the proof. ■ 

The proof of Proposition [3] can be adapted to the case in which di are not equal. In this case, the notation 
|r| = A:, d is replaced by |r| = k\T and has the following meaning: T indicates a column subset of D consisting of 
k blocks from T. Since T contains variable-length blocks, \T\ is not constant and depends on the particular column 
subset. Consequently, in order to apply the union bounds in (1761 ) we need to consider the worst-case scenario 
corresponding to the maximal block length in T. Proposition |3] thus holds for d = max((ij). However, it is clear 
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xlO" 



Fig. 4. The upper bound on S^^j as a function of the sparsity ratio r, for three sampling rates n/N, and three block structures d — 1,5, 20. 
The horizontal threshold is fixed on p* = y/2 — 1 representing the threshold for equivalence derived in Theorem [T] 



that the resulting probabiUty bound will not be as stringent as in the case of equal di = d, especially when the 
ratio max ( dj )/ mill (dj) is large. 

Proposition [3] holds as is for matrices D from the Bernoulli ensemble, namely ['D]ik = ±-^ with equal 
probability. In fact, the proposition is true for any ensemble for which the concentration of extreme singular 
values holds. 

The following corollary emphasizes the asymptotic behavior of block-RIP constants per given number of samples. 

Corollary 3: Consider the setting of Proposition [3l and define g{r) = y ^ ( V^+ \/'2^H{r)d^^ J. Then, 



Prob (^1 + 5k\i > 1 + (1 + e)g{T)) < 2e-'"^W^ 
Proof: Let A = (1 + e)g{r). The result then follows by replacing ([8T|)-(l83l) with 

^kd 



1 + A> 1 + 



n+V(^ + ^)7^^(^)' 



(88) 



(89) 



which leads to Prob(CT > 1 + A) < e-'"^('■)^ ■ 

To evaluate the asymptotic behavior of block-RIP we note that for every e > the right-hand side of (l88l ) goes 
to zero when N = md —>■ cxd. Consequently, for fixed d 



,A 



dk\i<p{r)=-l+[l+g{r)Y, 



(90) 



with overwhelming probability. In Fig. |4] we compute p{r) for several problem dimensions and compare it with 
standard RIP which is obtained when d = I. Evidently, as the non-zero entries are forced to block structure, a 
wider range of sparsity ratios r satisfy the condition of Theorem [T] 
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Fig. 5. The standard and block-RIP constants S/^^j for three different dimensions n, N. Each graph represent an average over 10 instances 
of random matrix D. Each instance of D is scaled by a factor such that dlSt is satisfied with a + P — 2. 



Although Fig. m shows advantage for block-RIP, the absolute sparsity ratios predicted by the theory are pessimistic 
as also noted in [12], [26] in the case of d = 1. To offer a more optimistic viewpoint, the RIP and block-RIP 
constants were computed brute-force for several instances of D from the Gaussian ensemble. Fig. |5] plots the 
results and qualitatively affirms that block-RIP constants are more "likely" to be smaller than their standard RIP 
counterparts, even when the dimensions n, A^ are relatively small. 

An important question is how many samples are needed roughly in order to guarantee stable recovery. This 
question is addressed in the following proposition, which quotes a result from [44] based on the proofs of [45]; 
we rephrase the result to match our notation. 

Proposition 4 ( [44, Theorem 3.3]): Consider the setting of Proposition [3l namely a random Gaussian matrix D 
of size n X N and block sparse signals over I = {di = d, . . . ,d„t = d}, where N = md for some integer m. Let 
t > and < (5 < 1 be constant numbers. If 



n 



>^(ln(2L)+fc<iln(|)+t), 



(91) 



where L = C^), then D satisfies the block-RIP (|2TI) with restricted isometry constant 6}^\j = 6, with probability at 
least 1 — e~*. 

As observed in [44], the first term in (l9Tt has the dominant impact on the required number of measurements in 
an asymptotic sense. Specifically, for block sparse signals 



(m/k)'' <L= C^) <{em/kY 



(92) 



Thus, for a given fraction of nonzeros r = kd/N, roughly n ss klog{m/k) = —klog{r) measurements are needed. 
For comparison, to satisfy the standard RIP a larger number n « —kdlog{r) is required. Note that Corollary |4] puts 
the emphasis on the required problem dimensions to satisfy a given RIP level. In contrast. Proposition [3] provides 
a tail bound on the expected isometry constant for given problem dimensions. 
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VIII. Conclusion 

In this paper, we studied the problem of recovering an unknown signal x in an arbitrary Hilbert space 7i, from 
a given set of n samples which are modelled as inner products of x with sampling functions Si,l < i < n. The 
signal X is known to lie in a union of subspaces, so that x G Vj where each of the subspaces Vi is a sum of k 
subspaces Ai chosen from an ensemble of m possibilities. Thus, there are (™) possible subspaces in which x can 
lie, and a-priori we do not know which subspace is the true one. While previous treatments of this model considered 
invertibility conditions, here we provide concrete recovery algorithms for a signal over a structured union. 

We began by showing that recovering x can be reduced to a sparsity problem in which the goal is to recover 
a block-sparse vector c from measurements y = Dc where the non-zero values in c are grouped into blocks. 
The measurement matrix D is equal to S* A where S* is the sampling operator and ^ is a set transformation 
corresponding to a basis for the sum of all Ai. To determine c we suggested a mixed £2/^1 convex optimization 
program that takes on the form of an SOCP. Relying on the notion of block-RIP, we developed sufficient conditions 
under which c can be perfectly recovered using the proposed algorithm. We also proved that under the same 
conditions, the unknown c can be stably approximated in the presence of noise. Furthermore, if c is not exactly 
block-sparse, then its best block-sparse approximation can be approached using the proposed method. We then 
showed that when D is chosen at random, the recovery conditions are satisfied with high probability. 

Specializing the results to MMV systems, we proposed a new method for sampling in MMV problems. In this 
approach each measurement vector depends on all the unknown vectors. As we showed, this can lead to better 
recovery rate. Furthermore, we established equivalence results for a class of MMV algorithms based on RIP. 

Throughout the paper, we assumed a finite union of subspaces as well as finite dimension of the underlying spaces. 
An interesting future direction to explore is the extension of the ideas developed herein to the more challenging 
problem of recovering x in a possibly infinite union of subspaces, which are not necessarily finite-dimensional. 
Although at first sight this seems like a difficult problem as our algorithms are inherently finite-dimensional, recovery 
methods for sparse signals in infinite dimensions have been addressed in some of our previous work [15]-[19]. In 
particular, we have shown that a signal lying in a union of shift-invariant subspaces can be recovered efficiently 
from certain sets of sampling functions. In our future work, we intend to combine these results with those in the 
current paper in order to develop a more general theory for recovery from a union of subspaces. 

A recent preprint [46] that was posted online after the submission of this paper proposes a new framework called 
model-based compressive sensing (MCS). The MCS approach assumes a vector signal model in which only certain 
predefined sparsity patterns may appear. In general, obtaining efficient recovery algorithms in such scenarios is 
difficult, unless further structure is imposed on the sparsity patterns. Therefore, the authors consider two types of 
sparse vectors: block sparsity as treated here, and a wavelet tree model. For these settings, they generalize two 
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known greedy algorithms: CoSaMP [47] and iterative hard thresholding (IHT) [44]. These results emphasize our 
claim that theoretical questions of uniqueness and stable representation can be studied for arbitrary unions as in 
[23]. However tractable recovery algorithms inherently require some structure, as the one considered here. 

The union model developed in this paper is broader than the block-sparse setting treated in [46] in the sense 
that it allows to model linear dependencies between the nonzero values rather than only between their locations, by 
appropriate choice of subspaces in ^, ([7]). In addition, we aim at optimization-based recovery algorithms (I26l).(l47l) 
which require selecting the objective in order to promote the model properties. Finally, we emphasize that our 
results are non asymptotic and also ensure stable recovery in the presence of noise and signal mismodeling. 
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